 current method


NovoBench: Benchmarking Deep Learning-based \emph{De Novo} Sequencing Methods in Proteomics

Neural Information Processing Systems

Tandem mass spectrometry has played a pivotal role in advancing proteomics, enabling the analysis of protein composition in biological tissues. Many deep learning methods have been developed for the \emph{de novo} peptide sequencing task, i.e., predicting the peptide sequence for an observed mass spectrum. However, two key challenges seriously hinder further research on this important task. First, since there is no consensus on evaluation datasets, the empirical results reported in different papers are often not comparable, leading to unfair comparisons. Second, current methods are usually limited to amino acid-level or peptide-level precision and recall metrics.


A Survey of LLM $\times$ DATA

arXiv.org Artificial Intelligence

The integration of large language models (LLMs) and data management (DATA) is rapidly redefining both domains. In this survey, we comprehensively review the bidirectional relationships. On the one hand, DATA4LLM, spanning large-scale data processing, storage, and serving, supplies LLMs with the high-quality, diverse, and timely data required for stages like pre-training, post-training, retrieval-augmented generation, and agentic workflows: (i) data processing for LLMs includes scalable acquisition, deduplication, filtering, selection, domain mixing, and synthetic augmentation; (ii) data storage for LLMs focuses on efficient data and model formats, distributed and heterogeneous storage hierarchies, KV-cache management, and fault-tolerant checkpointing; (iii) data serving for LLMs tackles challenges in RAG (e.g., knowledge post-processing), LLM inference (e.g., prompt compression, data provenance), and training strategies (e.g., data packing and shuffling). On the other hand, in LLM4DATA, LLMs are emerging as general-purpose engines for data management. We review recent advances in (i) data manipulation, including automatic data cleaning, integration, and discovery; (ii) data analysis, covering reasoning over structured, semi-structured, and unstructured data; and (iii) system optimization (e.g., configuration tuning, query rewriting, anomaly diagnosis), powered by LLM techniques like retrieval-augmented prompting, task-specialized fine-tuning, and multi-agent collaboration.


Explainable Predictive Maintenance: A Survey of Current Methods, Challenges and Opportunities

arXiv.org Artificial Intelligence

Predictive maintenance is a well-studied collection of techniques that aims to prolong the life of a mechanical system by using artificial intelligence and machine learning to predict the optimal time to perform maintenance. The methods allow maintainers of systems and hardware to reduce the financial and time costs of upkeep. As these methods are adopted for more serious and potentially life-threatening applications, the human operators need to trust the predictive system. This has drawn the field of Explainable AI (XAI) to introduce explainability and interpretability into the predictive system. XAI brings methods to the field of predictive maintenance that can increase users' trust while maintaining well-performing systems. This survey on explainable predictive maintenance (XPM) discusses and presents the current methods of XAI as applied to predictive maintenance while following the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) 2020 guidelines. We categorize the different XPM methods into groups that follow the XAI literature. Additionally, we include current challenges and a discussion on future research directions in XPM.


Explainable AI for Earth Observation: Current Methods, Open Challenges, and Opportunities

arXiv.org Artificial Intelligence

Deep learning has taken by storm all fields involved in data analysis, including remote sensing for Earth observation. However, despite significant advances in terms of performance, its lack of explainability and interpretability, inherent to neural networks in general since their inception, remains a major source of criticism. Hence it comes as no surprise that the expansion of deep learning methods in remote sensing is being accompanied by increasingly intensive efforts oriented towards addressing this drawback through the exploration of a wide spectrum of Explainable Artificial Intelligence techniques. This chapter, organized according to prominent Earth observation application fields, presents a panorama of the state-of-the-art in explainable remote sensing image analysis.


AI 'can accurately spot cancer': Algorithm better at spotting cancerous nodules than other methods

Daily Mail - Science & tech

A new artificial intelligence tool can accurately identify cancer in a development doctors and scientists said could speed up diagnosis of the disease. The algorithm performs more effectively than current methods, according to a study. It can identify whether abnormal growths found on CT scans are cancerous. The AI tool, designed by experts at the Royal Marsden NHS foundation trust, the Institute of Cancer Research, London, and Imperial College London, could fast-track patients to treatment.


Explainability in Deep Reinforcement Learning, a Review into Current Methods and Applications

arXiv.org Artificial Intelligence

Tasks such as weather simulation, medical diagnosis, business optimisation and automation like autonomous cars have benefited from these new Artificial Intelligence (AI) methods. Some of these ML models are used in ways such that their predictions can affect people's safety or commercial success. These models must be considered trustworthy, with errors detected and dealt with before they can affect the success or safety of the process being controlled. Neural Networks (NNs), and in particular Deep Neural Networks (DNNs), represent one such class of ML algorithm. Due to the nature of DNNs, the decisions they produce can seem arbitrary. These DNNs are composed of thousands of nodes that perform mathematical operations, creating a "black-box like" system, in which one is unable to judge the decisions being made by simple inspection.


DeepMind AI predicts incoming rainfall with high accuracy

#artificialintelligence

Having flexed its muscles in predicting kidney injury, toppling Go champions and solving 50-year-old science problems, artificial intelligence company DeepMind is now dipping its toes in weather forecasting. The company's latest tool is designed to predict oncoming precipitation through what's known as nowcasting, and the vast majority of meteorologists found it to be more accurate than current methods in early testing. The science of precipitation nowcasting focuses on predicting rain within the next one to two hours, and is of real importance in areas such as outdoor events, aviation and emergency planning. DeepMind set out to develop a machine-learning tool that can bring a new level of precision to these efforts, by making use of high-precision radar data that tracks precipitation every five minutes at a 1-km (0.62-mile) resolution. It did so by using a generative modeling approach, which analyzes the past 20 minutes of observed radar and then makes predictions for the upcoming 90 minutes.


Machine learning is paving the way towards 3D X-rays

#artificialintelligence

Researchers at the U.S. Department of Energy's (DOE) Argonne National Laboratory have developed a new AI-based framework that can produce X-ray images in 3D. The team, which includes members from three divisions at Argonne, has developed a method to create 3D visualizations from X-ray data. Their efforts were meant to allow them to better use the Advanced Photon Source (APS) at their lab, but potential applications of this technology range from astronomy to electron microscopy. Lab tests showed that the new approach, called 3D-CDI-NN, can create 3D visualizations from data hundreds of times faster than existing technology. "In order to make full use of what the upgraded APS will be capable of, we have to reinvent data analytics. Our current methods are not enough to keep up. Machine learning can make full use and go beyond what is currently possible," says Mathew Cherukara of the Argonne National Laboratory, corresponding author of the paper.


Helicopter Track Identification with Autoencoder

arXiv.org Artificial Intelligence

Computing power, big data, and advances in algorithms have led to a renewed interest in artificial intelligence (AI), especially in deep learning (DL). The success of DL depends largely on data representation because different representations can indicate, to a degree, the different explanatory factors of variation behind the data. In the last few years, the most successful story in DL has been supervised learning. However, one challenge in applying supervised learning is that data labels are expensive to get, noisy, or only partially available. Considering that we human beings learn in an unsupervised way, self-supervised learning methods have garnered a lot of attention recently. A dominant force in self-supervised learning is the autoencoder, which has multiple uses (e.g., data representation, anomaly detection, denoising). This research explored the application of an autoencoder to learn an effective data representation of helicopter flight track data, and then to support helicopter track identification. Our testing results are promising. For example, at Phoenix Deer Valley (DVT) airport, where 70% of recorded flight tracks have missing aircraft types, the autoencoder can help to identify twenty-two times more helicopters than otherwise detectable using rule-based methods; for Grand Canyon West Airport (1G4), the autoencoder can identify thirteen times more helicopters than a current rule-based approach. Our approach can also identify mislabeled aircraft types in the flight track data and find true types for records with pseudo aircraft type labels such as HELO. With improved labelling, studies using these data sets can produce more reliable results.